Connectivity Inference in Mass Spectrometry Based Structure Determination
Abstract
We consider the following Minimum Connectivity Inference (MCI) problem: given vertex sets V_i ⊆ V, i ∈ I, find the graph G = (V, E) minimizing the size of the edge set E, such that the subgraph of G induced by each V_i is connected. It arises in structural biology when one aims at finding the pairwise contacts between the proteins of a protein assembly, given the lists of proteins involved in sub-complexes. We present four contributions. First, using a reduction from set cover, we establish that MCI is APX-hard. Second, we show how to solve the problem to optimality using a mixed integer linear programming (MILP) formulation. Third, we develop a greedy algorithm based on union-find data structures (Greedy), yielding a 2(log₂|V| + log₂κ)-approximation, where κ is the maximum number of subsets V_i a vertex belongs to. Fourth, on the application side, we use the MILP and the greedy heuristic to solve the aforementioned connectivity inference problem in structural biology. We show that the solutions of MILP and Greedy are more parsimonious than those reported by the algorithm initially developed in biophysics, whose solutions are not qualified in terms of optimality. Since MILP outputs a set of optimal solutions, we introduce the notion of consensus solution. Using assemblies whose pairwise contacts are known exhaustively, we show an almost perfect agreement between the contacts predicted by our algorithms and the experimentally determined ones, especially for consensus solutions.

Key-words: connectivity inference, connected induced subgraphs, network design, APX-hard, mixed integer linear program, greedy algorithm, mass spectrometry, protein assembly, structural biology, biophysics, molecular machines

∗ INRIA Sophia-Antipolis Méditerranée
† Univ. Nice Sophia Antipolis, CNRS, I3S, UMR 7271, 06900 Sophia Antipolis, France
‡ Correspondence to [email protected] or to [email protected]

hal-00837496, version 1, 22 Jun 2013

Inférence de la connectivité pour la détermination de structure en spectrométrie de masse (Connectivity inference for structure determination in mass spectrometry)

Summary (Résumé): We consider the Minimum Connectivity Inference (MCI) problem, which arises in structural biology: given vertex sets V_i ⊆ V, i ∈ I, find the graph G = (V, E) minimizing the size of the edge set E, such that the subgraph of G induced by each set V_i is connected. In structural biology, this problem arises when determining the plausible contacts between the proteins of an assembly from the lists of proteins present in sub-complexes. We present four contributions. First, we show that MCI is APX-hard using a reduction from set cover. Second, we present a mixed integer linear programming (MILP) formulation that solves MCI optimally. Third, we propose a greedy algorithm (Greedy) based on Union-Find data structures. We show that this algorithm is a 2(log₂|V| + log₂κ)-approximation of the optimum, where κ is the maximum number of sets V_i containing a given vertex. Fourth, from an applied standpoint, we use the MILP approach and the greedy algorithm to solve the MCI problem in structural biology. We show that the solutions computed by MILP and Greedy are more parsimonious than those produced by the algorithm used to date in biophysics, which is not qualified in terms of optimality. Since the MILP and Greedy algorithms generate sets of solutions, we introduce the notion of consensus solution. Using assemblies whose contacts are known exhaustively, we show an almost perfect agreement between the contacts determined by our algorithms and those determined experimentally, in particular for consensus solutions.

Keywords (Mots-clés): connectivity inference, connected induced subgraph, APX-hard, mixed integer linear program, greedy algorithm, mass spectrometry, protein assembly, structural biology, biophysics, molecular machine
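To make the problem statement concrete, here is a minimal Python sketch, assuming the subsets V_i are given as Python sets of hashable, sortable vertex labels. The names `UnionFind`, `is_feasible`, `greedy_mci` and `consensus` are hypothetical. `is_feasible` checks the MCI constraint (every G[V_i] is connected). `greedy_mci` keeps one union-find structure per subset and repeatedly adds the edge that merges the most components, summed over all G[V_i]. This is a natural greedy rule, but it is not claimed to be the authors' Greedy algorithm, and the approximation bound above is not established for it here. Defining a consensus as the edges shared by at least a given fraction of solutions is also an assumption. The MILP formulation is not reproduced.

```python
from collections import Counter
from itertools import combinations


class UnionFind:
    """Disjoint-set forest over a fixed vertex set; tracks the component count."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.count = len(self.parent)

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        self.count -= 1
        return True


def is_feasible(subsets, edges):
    """True if every subset V_i induces a connected subgraph of G = (V, edges)."""
    for s in subsets:
        uf = UnionFind(s)
        for u, v in edges:
            if u in s and v in s:  # edge belongs to the induced subgraph G[V_i]
                uf.union(u, v)
        if uf.count > 1:
            return False
    return True


def greedy_mci(subsets):
    """Hypothetical greedy heuristic: repeatedly add the edge that merges the
    largest number of components, summed over all induced subgraphs G[V_i]."""
    subsets = [frozenset(s) for s in subsets]
    ufs = [UnionFind(s) for s in subsets]  # one union-find per subset V_i
    vertices = sorted(set().union(*subsets))
    # Only pairs sharing at least one subset can help connect some G[V_i].
    candidates = [(u, v) for u, v in combinations(vertices, 2)
                  if any(u in s and v in s for s in subsets)]

    def gain(edge):
        u, v = edge
        return sum(1 for s, uf in zip(subsets, ufs)
                   if u in s and v in s and uf.find(u) != uf.find(v))

    edges = []
    while any(uf.count > 1 for uf in ufs):
        best = max(candidates, key=gain)  # gain >= 1 while some G[V_i] is split
        edges.append(best)
        for s, uf in zip(subsets, ufs):
            if best[0] in s and best[1] in s:
                uf.union(*best)
    return edges


def consensus(solutions, min_fraction=1.0):
    """Hypothetical consensus: edges found in at least min_fraction of the solutions."""
    counts = Counter()
    for sol in solutions:
        counts.update({frozenset(e) for e in sol})  # undirected, deduplicated
    return {e for e, c in counts.items() if c >= min_fraction * len(solutions)}
```

For example, with subsets {A, B, C}, {A, B} and {C, D}, `greedy_mci` returns the edges (A, B), (A, C) and (C, D). Three edges are optimal here: {A, B} forces A-B, {C, D} forces C-D, and {A, B, C} needs one more edge to attach C. Each greedy step costs one pass over all candidate pairs and subsets. That is simple but slow for large assemblies.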